A Hybrid Framework Bridging Locality Analysis and Cache-Aware Scheduling for CMPs

Author

  • Xipeng Shen
Abstract

Industry is rapidly moving towards the adoption of Chip Multi-Processors (CMPs), in which the sharing of the memory hierarchy is becoming deeper and more heterogeneous. Without a good understanding of this sharing, most current systems schedule processes in a contention-oblivious way, leaving systems severely underutilized, with sub-optimal throughput and cache thrashing. In this report, we propose a three-stage framework to analyze shared-cache locality. It is based on an inclusive locality model and unifies online and offline adaptive analysis. Unlike previous methods, the inclusive locality model addresses all factors of cache contention at the same time. The goal is to produce a comprehensive understanding of the relations between program characteristics and run-time behavior in shared-cache systems while developing a scalable, adaptive, contention-aware scheduling system. Preliminary experiments demonstrate the potential benefits of contention-aware scheduling on CMPs and the promise of accurate run-time locality measurement, a critical component of the locality analysis framework.

Similar resources

CAPS: Contention-Aware Proactive Scheduling for CMPs

Many Chip Multiprocessors (CMPs) rely on shared caches to hide the latency of inter-thread communications as well as to improve effective memory bandwidth. Yet along comes cache contention, which often results in cache thrashing and severe performance degradation. Because of the variety of programs, a suitable schedule can often alleviate the issues significantly. However, it remains an open qu...

A Study of the Potential of Locality-Aware Thread Scheduling for GPUs

Programming models such as CUDA and OpenCL allow the programmer to specify the independence of threads, effectively removing ordering constraints. Still, parallel architectures such as the graphics processing unit (GPU) do not exploit the potential of data-locality enabled by this independence. Therefore, programmers are required to manually perform data-locality optimisations such as memory co...

Design and Implementation of a Cache Hierarchy-aware Task Scheduling for Parallel Loops on Multicore Architectures

Effective cache utilization is critical to performance in chip-multiprocessor (CMP) systems. Modern CMP architectures are based on a hierarchical cache topology with varying private and shared cache configurations at different levels. Cache-aware scheduling has become a great design challenge. Many scheduling strategies have been designed to target a specific cache configuration. In this paper we ...

Cache-Aware Virtual Machine Scheduling on Multi-Core Architecture

Facing practical limits to increasing processor frequencies, manufacturers have resorted to multi-core designs in their commercial products. In multi-core implementations, cores in a physical package share the last-level caches to improve inter-core communication. To efficiently exploit this facility, operating systems must employ cache-aware schedulers. Unfortunately, virtualization software, ...

Locality Aware Work-Stealing based Scheduling in Hybrid CPU-GPU Clusters

We study work-stealing based scheduling on a cluster of nodes with CPUs and GPUs. In particular, we evaluate locality aware scheduling in the context of distributed shared memory style programming, where the user is oblivious to data placement. Our runtime maintains a distributed map of data resident on various nodes and uses it to estimate the affinity of work to different nodes to guide sched...

Publication date: 2007